Supporting Tabular Data Characterization in a Large Scale Data Infrastructure by Lexical Matching Techniques

Authors

  • Leonardo Candela
  • Gianpaolo Coro
  • Pasquale Pagano
Abstract

Digital Libraries continue to evolve towards research environments supporting access to and management of multiform Information Objects spread across multiple data sources and organizational domains. This evolution has introduced the need to deal with Information Objects whose traits differ from those that characterized Digital Libraries in their early stages, and to revise the services supporting their management. Tabular data are a class of Information Objects that must be managed efficiently because of their core role in many eScience scenarios. This paper discusses the tabular data characterization problem, i.e., the problem of identifying the reference dataset for each column of a dataset. In particular, the paper presents an approach based on lexical matching techniques that supports users during the data curation phase by providing them with a ranked list of reference datasets suitable for a dataset column.
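The abstract does not spell out the matching procedure, so the following is only a minimal sketch of what ranking reference datasets by lexical matching could look like, not the authors' method. It assumes that each reference dataset is a plain list of strings, that values are normalized by lowercasing and collapsing whitespace, and that a dataset's score is the mean best-match similarity of the column's values. The function names, the scoring rule, and the sample data are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace so trivial spelling variants compare equal."""
    return " ".join(value.lower().split())


def best_similarity(value, reference):
    """Highest similarity in [0, 1] between a value and any entry of a reference set."""
    if value in reference:
        return 1.0
    return max(
        (SequenceMatcher(None, value, entry).ratio() for entry in reference),
        default=0.0,
    )


def rank_reference_datasets(column, references):
    """Rank reference datasets by the mean best-match score of the column values.

    The scoring rule is an assumption made for illustration only.
    """
    values = [normalize(v) for v in column if v and v.strip()]
    scores = []
    for name, entries in references.items():
        normalized = {normalize(e) for e in entries}
        if values:
            score = sum(best_similarity(v, normalized) for v in values) / len(values)
        else:
            score = 0.0
        scores.append((name, score))
    # Highest score first: this is the ranked list offered to the curator.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data: one column and two candidate reference datasets.
column = ["Thunnus albacares", "gadus morhua ", "Salmo salar"]
references = {
    "countries": ["Italy", "France", "Spain"],
    "species": ["Gadus morhua", "Thunnus albacares", "Salmo salar", "Xiphias gladius"],
}

ranking = rank_reference_datasets(column, references)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With this sample data the species list ranks first, because every column value matches one of its entries after normalization. A real system would likely need an index rather than the pairwise comparison used here, since comparing every value against every reference entry does not scale to large reference datasets.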

Similar resources

An Intelligent System’s Approach for Revitalization of Brown Fields using only Production Rate Data

State-of-the-art data analysis in production allows engineers to characterize reservoirs using production data. This saves companies large sums that would otherwise be spent on well testing and reservoir simulation and modeling. There are two shortcomings with today’s production data analysis: It needs bottom-hole or well-head pressure data in addition to data for rating reservoirs’ characteri...

Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems

Ontology is the main infrastructure of the Semantic Web, providing facilities for integrating, searching, and sharing information on the web. The development of ontologies as the basis of the Semantic Web, together with their heterogeneity, has led to the emergence of ontology matching. With the emergence of large-scale ontologies in real domains, ontology matching systems have faced problems such as memory con...

FLOPPIES: A Framework for Large-Scale Ontology Population of Product Information from Tabular Data in E-commerce Stores

With the vast amount of information available on the Web, there is an urgent need to structure Web data in order to make it available to both users and machines. E-commerce is one of the areas in which growing data congestion on the Web impedes data accessibility. This paper proposes FLOPPIES, a framework capable of semi-automatic ontology population of tabular product information from Web stor...

Leipzig Corpus Miner - A Text Mining Infrastructure for Qualitative Data Analysis

This paper presents the “Leipzig Corpus Miner”—a technical infrastructure for supporting qualitative and quantitative content analysis. The infrastructure aims at the integration of “close reading” procedures on individual documents with procedures of “distant reading”, e.g. lexical characteristics of large document collections. Therefore information retrieval systems, lexicometric statistics a...

Large Deformation Characterization of Mouse Oocyte Cell Under Needle Injection Experiment

In order to better understand the mechanical properties of biological cells, characterization and investigation of their material behavior are necessary. In this paper, a hyperelastic Neo-Hookean material model is used to characterize the mechanical properties of the mouse oocyte cell. For modeling, the cell is assumed to behave as a continuous, isotropic, nonlinear, and homogeneous material. Then, by...

Publication date: 2012